{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CI/CD for TFX pipelines"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning Objectives\n",
    "\n",
    "1.  Develop a CI/CD workflow with Cloud Build to build and deploy TFX pipeline code.\n",
    "2.  Integrate with Github to automatically trigger pipeline deployment with source code repository changes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this lab, you will walk through authoring a Cloud Build CI/CD workflow that automatically builds and deploys the same TFX pipeline from `lab-02.ipynb`. You will also integrate your workflow with GitHub by setting up a trigger that starts the workflow when a new tag is applied to the GitHub repo hosting the pipeline's code.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import yaml\n",
    "\n",
    "# Set `PATH` to include the directory containing TFX CLI.\n",
    "PATH=%env PATH\n",
    "%env PATH=/home/jupyter/.local/bin:{PATH}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -c \"import tfx; print('TFX version: {}'.format(tfx.__version__))\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: this lab was built and tested with the following package versions:\n",
    "\n",
    "`TFX version: 0.25.0`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Optional) If the TFX version above does not match the lab tested defaults, run the command below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install --upgrade --user tfx==0.25.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: you may need to restart the kernel to pick up the correct package versions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding the Cloud Build workflow\n",
    "Review the `cloudbuild.yaml` file to understand how the CI/CD workflow is implemented and how environment specific settings are abstracted using **Cloud Build** variables.\n",
    "\n",
    "The **Cloud Build** CI/CD workflow automates the steps you walked through manually during `lab-02`:\n",
    "1. Builds the custom TFX image to be used as a runtime execution environment for TFX components and as the AI Platform Training training container.\n",
    "1. Compiles the pipeline and uploads the pipeline to the KFP environment\n",
    "1. Pushes the custom TFX image to your project's **Container Registry**\n",
    "\n",
    "The **Cloud Build** workflow configuration uses both standard and custom [Cloud Build builders](https://cloud.google.com/cloud-build/docs/cloud-builders). The custom builder encapsulates **TFX CLI**. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuring environment settings\n",
    "\n",
    "Navigate to [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) page in the Google Cloud Console.\n",
    "\n",
    "### Create or select an existing Kubernetes cluster (GKE) and deploy AI Platform\n",
    "\n",
    "Make sure to select `\"Allow access to the following Cloud APIs https://www.googleapis.com/auth/cloud-platform\"` to allow for programmatic access to your pipeline by the Kubeflow SDK for the rest of the lab. Also, provide an `App instance name` such as \"tfx\" or \"mlops\". Note you may have already deployed an AI Pipelines instance during the Setup for the lab series. If so, you can proceed using that instance below in the next step.\n",
    "\n",
    "Validate the deployment of your AI Platform Pipelines instance in the console before proceeding."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configure environment settings\n",
    "\n",
    "Update  the below constants  with the settings reflecting your lab environment. \n",
    "\n",
    "- `GCP_REGION` - the compute region for AI Platform Training and Prediction\n",
    "- `ARTIFACT_STORE` - the GCS bucket created during installation of AI Platform Pipelines. The bucket name starts with the `kubeflowpipelines-` prefix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use the following command to identify the GCS bucket for metadata and pipeline storage.\n",
    "!gsutil ls"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* `CUSTOM_SERVICE_ACCOUNT` - In the gcp console Click on the Navigation Menu and navigate to `IAM & Admin`, then to `Service Accounts` and use the service account starting with prefix - 'tfx-tuner-caip-service-account'. This enables CloudTuner and the Google Cloud AI Platform extensions Tuner component to work together and allows for distributed and parallel tuning backed by AI Platform Vizier's hyperparameter search algorithm. Please see the lab setup `README` for setup instructions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `ENDPOINT` - set the `ENDPOINT` constant to the endpoint to your AI Platform Pipelines instance. The endpoint to the AI Platform Pipelines instance can be found on the [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) page in the Google Cloud Console. Open the *SETTINGS* for your instance and use the value of the `host` variable in the *Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SKD* section of the *SETTINGS* window. The format is `'....[region].pipelines.googleusercontent.com'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#TODO: Set your environment resource settings here for GCP_REGION, ARTIFACT_STORE_URI, ENDPOINT, and CUSTOM_SERVICE_ACCOUNT. \n",
    "\n",
    "GCP_REGION = ''\n",
    "ARTIFACT_STORE_URI = ''\n",
    "CUSTOM_SERVICE_ACCOUNT = ''\n",
    "ENDPOINT = ''\n",
    "\n",
    "PROJECT_ID = !(gcloud config get-value core/project)\n",
    "PROJECT_ID = PROJECT_ID[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating the TFX CLI builder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Review the Dockerfile for the TFX CLI builder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat tfx-cli/Dockerfile"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat tfx-cli/requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build the image and push it to your project's Container Registry"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Hint**: Review the [Cloud Build](https://cloud.google.com/cloud-build/docs/running-builds/start-build-manually#gcloud) gcloud command line reference for builds submit. Your image should follow the format `gcr.io/[PROJECT_ID]/[IMAGE_NAME]:latest`. Note the source code for the tfx-cli is in the directory `./tfx-cli`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "IMAGE_NAME='tfx-cli'\n",
    "TAG='latest'\n",
    "IMAGE_URI='gcr.io/{}/{}:{}'.format(PROJECT_ID, IMAGE_NAME, TAG)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Your gcloud command here to build tfx-cli and submit to Container Registry.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise: manually trigger CI/CD pipeline run with Cloud Build\n",
    "\n",
    "You can manually trigger **Cloud Build** runs using the `gcloud builds submit` command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PIPELINE_NAME='tfx_covertype_continuous_training'\n",
    "MODEL_NAME='tfx_covertype_classifier'\n",
    "DATA_ROOT_URI='gs://workshop-datasets/covertype/small'\n",
    "TAG_NAME='test'\n",
    "TFX_IMAGE_NAME='lab-03-tfx-image'\n",
    "PIPELINE_FOLDER='pipeline'\n",
    "PIPELINE_DSL='runner.py'\n",
    "RUNTIME_VERSION='2.3'\n",
    "PYTHON_VERSION='3.7'\n",
    "USE_KFP_SA='False'\n",
    "ENABLE_TUNING='True'\n",
    "\n",
    "SUBSTITUTIONS=\"\"\"\n",
    "_GCP_REGION={},\\\n",
    "_ARTIFACT_STORE_URI={},\\\n",
    "_CUSTOM_SERVICE_ACCOUNT={},\\\n",
    "_ENDPOINT={},\\\n",
    "_PIPELINE_NAME={},\\\n",
    "_MODEL_NAME={},\\\n",
    "_DATA_ROOT_URI={},\\\n",
    "_TFX_IMAGE_NAME={},\\\n",
    "TAG_NAME={},\\\n",
    "_PIPELINE_FOLDER={},\\\n",
    "_PIPELINE_DSL={},\\\n",
    "_RUNTIME_VERSION={},\\\n",
    "_PYTHON_VERSION={},\\\n",
    "_USE_KFP_SA={},\\\n",
    "_ENABLE_TUNING={},\n",
    "\"\"\".format(GCP_REGION, \n",
    "           ARTIFACT_STORE_URI,\n",
    "           CUSTOM_SERVICE_ACCOUNT,\n",
    "           ENDPOINT,\n",
    "           PIPELINE_NAME,\n",
    "           MODEL_NAME,\n",
    "           DATA_ROOT_URI,\n",
    "           TFX_IMAGE_NAME,\n",
    "           TAG_NAME, \n",
    "           PIPELINE_FOLDER,\n",
    "           PIPELINE_DSL,\n",
    "           RUNTIME_VERSION,\n",
    "           PYTHON_VERSION,\n",
    "           USE_KFP_SA,\n",
    "           ENABLE_TUNING\n",
    "           ).strip()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hint: you can manually trigger **Cloud Build** runs using the `gcloud builds submit` command. See the [documentation](https://cloud.google.com/sdk/gcloud/reference/builds/submit) for pass the `cloudbuild.yaml` file and SUBSTITIONS as arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: write gcloud builds submit command to trigger manual pipeline run.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise: Setting up GitHub integration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this exercise you integrate your CI/CD workflow with **GitHub**, using [Cloud Build GitHub App](https://github.com/marketplace/google-cloud-build). \n",
    "You will set up a trigger that starts the CI/CD workflow when a new tag is applied to the **GitHub** repo managing the  pipeline source code. You will use a fork of this repo as your source GitHub repository."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a fork of this repo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### [Follow the GitHub documentation](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) to fork this repo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create a Cloud Build trigger\n",
    "\n",
    "Connect the fork you created in the previous step to your Google Cloud project and create a trigger following the steps in the [Creating GitHub app trigger](https://cloud.google.com/cloud-build/docs/create-github-app-triggers) article. Use the following values on the **Edit trigger** form:\n",
    "\n",
    "|Field|Value|\n",
    "|-----|-----|\n",
    "|Name|[YOUR TRIGGER NAME]|\n",
    "|Description|[YOUR TRIGGER DESCRIPTION]|\n",
    "|Event| Tag|\n",
    "|Source| [YOUR FORK]|\n",
    "|Tag (regex)|.\\*|\n",
    "|Build Configuration|Cloud Build configuration file (yaml or json)|\n",
    "|Cloud Build configuration file location|/ workshops/tfx-caip-tf23/lab-03-tfx-cicd/labs/cloudbuild.yaml|\n",
    "\n",
    "\n",
    "Use the following values for the substitution variables:\n",
    "\n",
    "|Variable|Value|\n",
    "|--------|-----|\n",
    "|_GCP_REGION|[YOUR GCP_REGION]|\n",
    "|_CUSTOM_SERVICE_ACCOUNT|[YOUR CUSTOM_SERVICE_ACCOUNT]|\n",
    "|_ENDPOINT|[Your inverting proxy host pipeline ENDPOINT]|\n",
    "|_TFX_IMAGE_NAME|lab-03-tfx-image|\n",
    "|_PIPELINE_NAME|tfx_covertype_continuous_training|\n",
    "|_MODEL_NAME|tfx_covertype_classifier|\n",
    "|_DATA_ROOT_URI|gs://workshop-datasets/covertype/small|\n",
    "|_PIPELINE_FOLDER|workshops/tfx-caip-tf23/lab-03-tfx-cicd/labs/pipeline|\n",
    "|_PIPELINE_DSL|runner.py|\n",
    "|_PYTHON_VERSION|3.7|\n",
    "|_RUNTIME_VERSION|2.3|\n",
    "|_USE_KFP_SA|False|\n",
    "|_ENABLE_TUNING|True|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Trigger the build\n",
    "\n",
    "To start an automated build [create a new release of the repo in GitHub](https://help.github.com/en/github/administering-a-repository/creating-releases). Alternatively, you can start the build by applying a tag using `git`. \n",
    "```\n",
    "git tag [TAG NAME]\n",
    "git push origin --tags\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Verify triggered build in Cloud Build dashboard\n",
    "\n",
    "After you see the pipeline finish building on the Cloud Build dashboard, return to [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) in the console. Click `OPEN PIPELINES DASHBOARD` and view the newly deployed pipeline. Creating a release tag on GitHub will create a pipeline with the name `tfx_covertype_continuous_training-[TAG NAME]` while doing so from GitHub will create a pipeline with the name `tfx_covertype_continuous_training_github-[TAG NAME]`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Steps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this lab, you walked through authoring a Cloud Build CI/CD workflow that automatically builds and deploys a TFX pipeline. You also integrated your TFX workflow with GitHub by setting up a Cloud Build trigger. In the next lab, you will walk through inspection of TFX metadata and pipeline artifacts created during TFX pipeline runs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# License"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font size=-1>Licensed under the Apache License, Version 2.0 (the \\\"License\\\");\n",
    "you may not use this file except in compliance with the License.\n",
    "You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)\n",
    "\n",
    "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \\\"AS IS\\\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the License for the specific language governing permissions and limitations under the License.</font>"
   ]
  }
 ],
 "metadata": {
  "environment": {
   "name": "tf2-2-3-gpu.2-3.m59",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-2-3-gpu.2-3:m59"
  },
  "kernelspec": {
   "display_name": "Python [conda env:root] *",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
